Write all code in the chunks provided!

Remember to unzip to a real directory before running everything!

Problem 1 should be roughly analogous to what we’ve done in class. There are hints at the bottom of this document if you get stuck. If you still can’t figure it out, search Google or Stack Exchange, or ask a friend. If none of that works, email me or come to office hours :).

Problem 1: Piping Hot Variables

1.1 Set up your environment by:
-A. Loading the tidyverse
-B. Reading the Airbnb data (I haven’t included it in this .zip file, so you’ll have to move it into the right directory)

library(tidyverse)
airbnb <- read_csv('~/OneDrive - UW/UW/Data and Society/Lab/Lab Homeworks/Homework 1/airbnb.csv') # read_csv() comes from the tidyverse (readr); it keeps text columns as character rather than converting them to factors
Missing column names filled in: 'X1' [1]
Parsed with column specification:
cols(
  .default = col_character(),
  X1 = col_integer(),
  id = col_integer(),
  scrape_id = col_double(),
  last_scraped = col_date(format = ""),
  host_id = col_integer(),
  host_since = col_date(format = ""),
  host_listings_count = col_integer(),
  host_total_listings_count = col_integer(),
  zipcode = col_integer(),
  latitude = col_double(),
  longitude = col_double(),
  accommodates = col_integer(),
  bathrooms = col_double(),
  bedrooms = col_integer(),
  beds = col_integer(),
  square_feet = col_integer(),
  guests_included = col_integer(),
  minimum_nights = col_integer(),
  maximum_nights = col_integer(),
  availability_30 = col_integer()
  # ... with 16 more columns
)
See spec(...) for full column specifications.
number of columns of result is not a multiple of vector length (arg 2)
1 parsing failure.
 row col     expected               actual file
3778 zipcode no trailing characters -3220  '~/OneDrive - UW/UW/Data and Society/L…

1.2 Use the data to answer this question: For how many units does the host live in a different neighborhood than the listing?

1.3 Building on that work, what is the average number of listings for hosts who live in the same neighborhood as their listing, and for hosts who live in a different neighborhood?

airbnb %>% select(neighbourhood, host_neighbourhood)

1.4 Reflect on your answer to 1.3. What might cause the results you got? How does that connect to the idea that Airbnb might be changing neighborhoods?
Your answer here should be at least a few sentences.

Problem 2: Literature Review

This question asks you to think deeply about the research question you’re investigating. Each answer should be around 100 words.

2.1: What dataset did you select (include a link again)? Why did you select it? What is your research question? What variables do you plan to use to answer your question?

2.2: Find at least two articles (at least one must be from an academic journal) that have addressed a question similar to your own. What data did they use? What problems did they have? If you ‘can’t find’ two articles, provide a screenshot of your search in the university library system from here: http://www.lib.washington.edu/

2.3: What is one way that you have to modify or examine your data to begin to answer your question?

Problem 3: Pipe your own data

3.1: Using the functions we’ve worked with in class (select, filter, mutate, group_by, summarise), plus any others you’d like to use, examine the key relationship from your research question.

You must:
a. Create a new dataset that only includes the variables you’re interested in.
b. Output a version of that dataset that only includes certain values, hopefully ones you’re interested in.
c. Create a modified version of one of your variables (many of you will need to do this, but even if you don’t, I want to see that you can).
d. Use group_by to group your data by one variable and see the mean (or similar) of another variable in those groups.

Use as many code chunks as you need

Hints

1.2

Try using these steps:
Step 1: Identify the variables you need.
Listing neighborhood: neighbourhood
Host’s neighborhood: host_neighbourhood

Step 2: Filter the data to only include the rows where those variables are not equal (check online if you’re not sure how to write "not equal" in R; remember that equals is ==, less than is <).

Step 3: How many rows are left in the filtered data?
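
The three steps above can be sketched on a small made-up table. The tibble `toy` and its values are invented for illustration and only borrow the two column names from Step 1; treat this as one possible approach rather than the required answer.

```r
library(dplyr)

# Hypothetical toy data (not the Airbnb file); only the column names follow the hints
toy <- tibble(
  neighbourhood      = c("Ballard", "Fremont", "Ballard", "Wallingford"),
  host_neighbourhood = c("Ballard", "Ballard", NA,        "Fremont")
)

# Step 2: keep rows where the two columns differ; != means "not equal" in R
different <- toy %>%
  filter(neighbourhood != host_neighbourhood)

# Step 3: count the rows that are left
nrow(different)
```

Note that filter() drops rows where the comparison evaluates to NA, so the toy row with a missing host neighbourhood is not counted as "different".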

1.3
Ignore/don’t worry about NAs.
You might want to make a new variable indicating if a host is a local host (your answer to 1.2 will help here!).
The variable for number of listings is host_listings_count.
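
As a rough illustration of the 1.3 hints, the sketch below flags local hosts and averages listing counts on invented data. The tibble `toy`, its values, and the names `local_host` and `mean_listings` are assumptions made for this example; only the three column names come from the hints.

```r
library(dplyr)

# Hypothetical toy data; values are invented, column names follow the hints
toy <- tibble(
  neighbourhood       = c("Ballard", "Fremont", "Ballard", "Wallingford"),
  host_neighbourhood  = c("Ballard", "Ballard", "Ballard", "Fremont"),
  host_listings_count = c(1, 4, 3, 6)
)

# local_host (illustrative name) is TRUE when host and listing share a neighbourhood
toy_summary <- toy %>%
  mutate(local_host = neighbourhood == host_neighbourhood) %>%
  group_by(local_host) %>%
  # na.rm = TRUE skips missing listing counts when taking the mean
  summarise(mean_listings = mean(host_listings_count, na.rm = TRUE))

toy_summary
```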

3.1
a. Use select()
b. Use filter()
c. Use mutate()
d. Use group_by(var1) %>% summarise(mean = mean(var2))
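
To show how the four pieces might fit together, here is a minimal sketch that uses R's built-in `mtcars` data as a stand-in for your own dataset. The object names (`cars_small`, `cars_light`, `by_cyl`) and the choice of variables are illustrative assumptions, not part of the assignment.

```r
library(dplyr)

# a. select(): keep only the variables of interest
cars_small <- mtcars %>%
  select(mpg, cyl, wt)

# b. filter(): keep only certain values (wt is measured in 1,000 lbs)
cars_light <- cars_small %>%
  filter(wt < 4)

# c. mutate(): create a modified version of a variable
cars_light <- cars_light %>%
  mutate(wt_lbs = wt * 1000)

# d. group_by() + summarise(): mean of one variable within groups of another
by_cyl <- cars_light %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg))

by_cyl
```

Saving each step to its own object makes it easy to inspect the intermediate results; the same steps could also be chained into a single pipe.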
